PROBLEM TO BE SOLVED: To speed up communication 
between processors by comparing the parameters obtained 
when an execution-time code is executed with the 
parameters stored in a work area, and having the 
processors communicate with each other by reusing a 
communication pattern or memory access pattern 
according to the comparison result. 

SOLUTION: It is analyzed whether the communication 
pattern that arises when a loop nest is executed can 
be cached (step 11), and a plurality of parameters 
determining the communication pattern between the 
processors are extracted (steps 12 and 13). The 
current parameters are then compared with the old 
parameters stored in the work area (step 14), it is 
checked from the comparison result whether the cached 
data are the same (step 15), and the communication 
pattern or memory access pattern is reused to carry 
out the communication between the processors (steps 
16-18). Consequently, the communication between the 
processors can be sped up. 

Japanese Patent: Laid-open No. Hei 9-62639 
[Abstract] (with modification) 

[Problem] To carry out high-speed communication between 
processors by using the past communication history and 
omitting the calculation of a communication pattern. 
[Constitution] In a parallel computer, there are a step of 
extracting a plurality of parameters determining a 
communication pattern between processors, a step of preparing 
a work area for storing the parameters, a step of generating 
an execution-time code for storing the parameters in the work 
area, a step of storing the parameters in the work area for 
the purpose of storing the history of the communication 
pattern or the history of a memory access pattern for reading 
and writing data for communication between the processors, and 
a step of, in the case where the execution-time code is 
executed, comparing the parameters based on the execution of 
the execution-time code with the parameters stored in the work 
area. According to the comparison result, the communication 
pattern or the memory access pattern is reused to execute 
communication between the processors. 

[Scope of Claims for Patent] 
[Claim 1] A method of executing communication between 
processors in a computer having a plurality of processors 
characterized by comprising the steps of: 

(a) extracting a plurality of parameters determining a 
communication pattern between said processors; 

(b) preparing a work area for storing said parameters; 

(c) generating an execution-time code for storing said 
parameters in said work area; 

(d) storing said parameters in said work area for the 
purpose of storing the history of said communication pattern 
or the history of a memory access pattern for reading and 
writing data for communication between said processors; 

(e) in the case where said execution-time code is 
executed, comparing said parameters based on the execution of 
said execution-time code with said parameters stored in said 
work area; and 

(f) according to the comparison result, reusing said 
communication pattern or said memory access pattern to execute 
communication between said processors. 

[Claim 2] A method as claimed in claim 1, characterized in 
that extraction of said parameters in the above step (a) is 
carried out by checking whether the values remain constant or 
change during execution of a loop nest and by judging whether 
said communication pattern is cacheable or not. 

[Claim 3] A method as claimed in claim 1, characterized in 
that the above step (e) is comparing said parameters based on 
the execution of said execution-time code with said parameters 
with regard to said communication pattern stored in said work 
area to check whether they are the same. 

[Claim 4] A method as claimed in claim 1, characterized in 
that the above step (e) is comparing said parameters based on 
the execution of said execution-time code with said parameters 
with regard to said communication pattern stored in said work 
area to check whether they are the same, and, in the case 
where they are the same, comparing said parameters based on 
the execution of said execution-time code with said parameters 
with regard to said memory access pattern to check whether 
they are the same. 

[Claim 5] A method as claimed in claim 1 or 2, characterized 
in that said parameters extracted in the above step (a) include 
parameters with regard to a loop in a program, the number of 
processors for dividing the arrays on the left side and the 
right side in an assignment statement in said loop, the size of 
the respective dimensions of said arrays, the method of dividing 
said respective dimensions of said arrays, and parameters in an 
equation for deciding the reference of said arrays. 

[Claim 6] A method as claimed in claim 1, characterized in 
that said communication pattern describes an area for 
transferring data from one processor to another processor. 
[Claim 7] A method as claimed in claim 1, characterized in 
that said memory access pattern describes a memory area 
managed by a processor when data is transferred according to 
said communication pattern. 

[Claim 8] A method as claimed in claim 1, characterized in 
that the above step (f) uses said parameters stored in said 
work area. 

[Claim 9] A method of executing communication between 
processors in a computer having a plurality of processors 
characterized by comprising the steps of: 

(a) extracting a plurality of parameters determining a 
communication pattern between said processors; 

(b) preparing a work area for storing said parameters 
for the purpose of storing the history of said communication 
pattern or the history of a memory access pattern; and 

(c) generating an execution-time code for storing said 
parameters in said work area. 

[Claim 10] A method of executing communication between 
processors in a computer having a plurality of processors 
characterized by comprising the steps of: 

(a) storing in a work area secured in advance a 
plurality of parameters determining a communication pattern 
between said processors; 

(b) in the case where an execution-time code is 
executed, comparing parameters based on the execution of said 
execution-time code with said parameters stored in said work 
area; and 

(c) according to the comparison result, reusing said 
communication pattern or said memory access pattern. 
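
The steps recited in the claims above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `PatternParams`, `WorkArea`, and `get_pattern` are hypothetical, the parameter bundle covers only a one-dimensional array, and the pattern computation is passed in as a caller-supplied function.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass(frozen=True)
class PatternParams:
    """Hypothetical bundle of the parameters listed in claim 5 (1-D case)."""
    loop_bounds: Tuple[int, int]   # lower and upper bound of the loop index
    num_procs: int                 # processors dividing the array
    array_size: int                # size of the array dimension
    distribution: str              # e.g. "BLOCK"
    ref_coeffs: Tuple[int, int]    # (a, b) of the reference expression a*I + b


class WorkArea:
    """Stores the last parameters and the pattern computed from them."""

    def __init__(self) -> None:
        self.old_params: Optional[PatternParams] = None
        self.pattern: Any = None
        self.recomputed = 0  # how many times the pattern had to be computed

    def get_pattern(self, params: PatternParams,
                    compute: Callable[[PatternParams], Any]) -> Any:
        # Step (e): compare the current parameters with the stored ones.
        if self.old_params == params:
            # Step (f): identical parameters, so reuse the cached pattern.
            return self.pattern
        # Otherwise compute the pattern and store the history (step (d)).
        self.pattern = compute(params)
        self.old_params = params
        self.recomputed += 1
        return self.pattern
```

Calling `get_pattern` twice with equal parameters invokes `compute` only once; any change in a parameter forces a recomputation and overwrites the stored history.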



(19) Japan Patent Office (JP)  (12) Laid-open Patent Gazette (A) 
(11) Laid-open No. Hei 9-62639 
(43) Date of publication: March 7, 1997 
(51) Int. Cl.6: G06F 15/163   FI: G06F 15/16 320 K 
(21) Application No. Hei 7-215601 
(22) Date of filing: August 24, 1995 
(71) Applicant: 592073101 

[Detailed Description of the Invention] 

[0004] 
[Equation 1] D(F(i)) = C(i) 

[0007] In an assignment statement in a program, the array on the 
left side is referred to as LHS (Left-Hand-Side) and the array on 
the right side as RHS (Right-Hand-Side). 

[0008] 
[Equation 2] D_LHS(F_LHS(i)) = D_RHS(F_RHS(i)) 

[0013] Accordingly, the parameters that determine the 
communication pattern are as follows: 

(1) the lower and upper bounds of the loop index; 
(2) the number of processors dividing each dimension of the left-side array; 
(3) the size of each dimension of the left-side array; 
(4) the distribution method of each dimension of the left-side array; 
(5) the parameters of the expression (aI+b) determining the left-side array reference; 
(6) the number of processors dividing each dimension of the right-side array; 
(7) the size of each dimension of the right-side array; 
(8) the distribution method of each dimension of the right-side array; 
(9) the parameters of the expression (aI+b) determining the right-side array reference. 

      SUBROUTINE SUB(A,N) 
      REAL A(N) 
!HPF$ PROCESSORS P(4) 
!HPF$ DISTRIBUTE (BLOCK) onto P :: A 
      DO TIME=1,10 
        DO I=2,N-1 
          A(I)=A(I-1)+A(I)+A(I+1) 
        ENDDO 
      ENDDO 
      END 

[0054] Compiling this program yields the following code. 

[0055] 
[Table 6] 

      SUBROUTINE SUB(A,N) 
      REAL A(N) 

      DO TIME=1,10 
        w0.array_ub(1)=N 
        w0.loop_ub(1)=N-1 
        call Compute_LIS(A(I), LIS, w0) 

        w1.array_ub(1)=N 
        call Do_Prefetch_Comm(LIS, A(I+1), w1) 
        w2.array_ub(1)=N 
        call Do_Pipeline_Recv(LIS, A(I-1), w2) 

        DO I=LIS.LB(1),LIS.UB(1) 
          A(I)=A(I-1)+A(I)+A(I+1) 
        ENDDO 

        w3.array_ub(1)=N 
        call Do_Pipeline_Send(LIS, A(I-1), w2) 
      ENDDO 

[0056] In the following, N=100 is assumed and processor 1 is 
considered. Compute_LIS computes the range of the loop index 
that processor 1 executes. Of the parameters (1) to (9) above, 
it uses parameters (1) to (5). 

[0057] 
[Table 7] 

(1) Lower and upper bounds of the loop index: 2, N-1 
(2) Number of processors dividing each dimension of the left-side array: dimension 1: 4 
(3) Size of each dimension of the left-side array: dimension 1: N 
(4) Distribution method of each dimension of the left-side array: dimension 1: BLOCK 
(5) Parameters of the expression (aI+b) determining the left-side array reference: dimension 1: a=1, b=0 

[0058] With the array size N and the loop index bounds 2 and 
N-1, the loop index set executed by processor 1, expressed in 
Fortran 90 triplet notation, is [2:25:1]. 

[0060] 

(6) Number of processors dividing each dimension of the right-side array: dimension 1: 4 
(8) Distribution method of each dimension of the right-side array: dimension 1: BLOCK 
(9) Parameters of the expression (aI+b) determining the right-side array references: 
    dimension 1: a=1, b=-1 
    dimension 1: a=1, b=0 
    dimension 1: a=1, b=1 
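
The triplet [2:25:1] for N=100 on four processors can be reproduced with a short Python sketch. It assumes the owner-computes rule (each processor executes the iterations whose left-side element it owns) and a BLOCK distribution with block size ceil(N/P); neither assumption is stated in the recoverable text, and the function names are hypothetical, not the patent's Compute_LIS.

```python
import math
from typing import List, Optional, Tuple


def block_range(n: int, num_procs: int, proc: int) -> Tuple[int, int]:
    """1-based inclusive index range owned by processor `proc` (1-based)."""
    size = math.ceil(n / num_procs)
    return (proc - 1) * size + 1, min(proc * size, n)


def local_iteration_set(loop_lb: int, loop_ub: int, n: int, num_procs: int,
                        proc: int, a: int = 1, b: int = 0
                        ) -> Optional[Tuple[int, int, int]]:
    """Iterations I whose left-side element a*I+b is owned by `proc`,
    returned as a Fortran 90 style triplet (first, last, stride)."""
    lo, hi = block_range(n, num_procs, proc)
    its = [i for i in range(loop_lb, loop_ub + 1) if lo <= a * i + b <= hi]
    return (its[0], its[-1], 1) if its else None


def remote_elements(iter_set: Tuple[int, int, int], n: int, num_procs: int,
                    proc: int, a: int, b: int) -> List[int]:
    """Right-side elements a*I+b referenced by `proc` but owned elsewhere."""
    lo, hi = block_range(n, num_procs, proc)
    first, last, step = iter_set
    return sorted({a * i + b for i in range(first, last + 1, step)
                   if not lo <= a * i + b <= hi})


lis = local_iteration_set(2, 99, 100, 4, proc=1)
print(lis)                                          # (2, 25, 1)
print(remote_elements(lis, 100, 4, 1, a=1, b=1))    # [26]
```

Under these assumptions the only non-local element processor 1 needs for the reference A(I+1) is element 26, which matches the section [26:26:1] mentioned in the text.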

[0061]-[0063] At TIME=1 no old parameters have been stored yet. 
The parameters relating to A(I+1) are therefore stored in work 
area w1 and those relating to A(I-1) in work areas w2 and w3, the 
communication analysis is performed, and the section [26:26:1] 
is exchanged with processor 2. 

[0064] Next, consider the case of TIME=2. Here too N is 100, as 
at TIME=1. Compute_LIS computes the range of the loop index: the 
array size N and the loop upper bound N-1 passed to Compute_LIS 
are compared with the old parameters in work area w0. 

[0065] Next, the communication pattern between the processors is 
computed. First, consider the communication for A(I+1). The 
array size N passed to Do_Prefetch_Comm is compared with the old 
parameters in work area w1. Because all the parameters equal the 
old parameters of TIME=1, the communication pattern used at 
TIME=1, in which the section [26:26:1] is exchanged with 
processor 2, is reused. For TIME=3 through TIME=10 as well, the 
communication pattern of TIME=1 is reused. 

[0067] 
[Table 9] 

      SUBROUTINE SUB(A,N) 
      REAL A(N,2) 
!HPF$ PROCESSORS P(4) 
!HPF$ DISTRIBUTE (BLOCK,*) onto P :: A 
      K=1 
      DO TIME=1,10 
        DO I=2,N-1 
          A(I,3-K)=A(I-1,K)+A(I,K)+A(I+1,K) 
        ENDDO 
        K=3-K 
      ENDDO 
      END 

[0068] Compiling this program yields the following code. 

[0069] 
[Table 10] 

      SUBROUTINE SUB(A,N) 
      REAL A(N,2) 
      K=1 

      DO TIME=1,10 
        w0.array_ub(1)=N 
        w0.loop_ub(1)=N-1 
        call Compute_LIS(A(I), LIS, w0) 

        w1.array_ub(1)=N 
        call Do_Prefetch_Comm(LIS, A(I+1), w1, K) 
        w2.array_ub(1)=N 
        call Do_Pipeline_Recv(LIS, A(I-1), w2, K) 

        DO I=LIS.LB(1),LIS.UB(1) 
          A(I,3-K)=A(I-1,K)+A(I,K)+A(I+1,K) 
        ENDDO 

        w3.array_ub(1)=N 
        call Do_Pipeline_Send(LIS, A(I-1), w2, K) 

        K=3-K 
      ENDDO 

[0070] In the following, N=100 is assumed and processor 1 is 
considered. Compute_LIS computes the range of the loop index 
that processor 1 executes. Of the parameters (1) to (9) above, 
it uses parameters (1) to (5). 

[0071] 

(1) Lower and upper bounds of the loop index: 2, N-1 
(2) Number of processors dividing each dimension of the left-side array: dimension 1: 4; dimension 2: 0 
(3) Size of each dimension of the left-side array: dimension 1: N; dimension 2: N 
(4) Distribution method of each dimension of the left-side array: dimension 1: BLOCK; dimension 2: * 
(5) Parameters of the expression (aI+b) determining the left-side array reference: dimension 1: a=1, b=0; dimension 2: a=0, b=K 

[0072] The loop index set executed by processor 1, expressed in 
Fortran 90 triplet notation, is [2:25:1]. 

[0074] To compute the communication pattern between the 
processors, parameters (6) to (9) are used. 

[0075] 
[Table 12] 

(6) Number of processors dividing each dimension of the right-side array: dimension 1: 4; dimension 2: 0 
(7) Size of each dimension of the right-side array: dimension 1: N; dimension 2: N 
(8) Distribution method of each dimension of the right-side array: dimension 1: BLOCK 
(9) Parameters of the expression (aI+b) determining the right-side array references: 
    dimension 1: a=1, b=-1; dimension 2: a=0, b=K 
    dimension 1: a=1, b=0; dimension 2: a=0, b=K 
    dimension 1: a=1, b=1; dimension 2: a=0, b=K 

[0076]-[0078] At TIME=1 the parameters relating to A(I+1,K) and 
A(I-1,K) are stored in work areas w1 and w2, the communication 
analysis is performed, and the section [26:26:1] is exchanged 
with processor 2. 

[0079] Next, consider the case of TIME=2. Here too N is 100, as 
at TIME=1, but K is now 2. Compute_LIS computes the range of the 
loop index. The parameters passed to Compute_LIS are compared 
with the old parameters in the work area, that is, those of 
TIME=1. Because all the parameters are equal, the loop index set 
[2:25:1] obtained at TIME=1 is reused. 

[0080] Next, the communication pattern between the processors is 
computed. First, consider the communication for A(I+1,K). The 
array size N passed to Do_Prefetch_Comm is compared with the old 
parameters in work area w1. Because all the parameters equal the 
old parameters of TIME=1, the communication pattern used at 
TIME=1, in which the section [26:26:1] is exchanged with 
processor 2, is reused. 

[0081] However, the data actually communicated is the section 
[26:26:1][2] for A(I-1), and the section [26:26:1][2] for A(I+1). 

[0082] For TIME=3 through TIME=10 as well, the communication 
pattern of TIME=1 is reused. 

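
The two-level reuse described above for the two-dimensional example (the communication pattern stays cached while the memory access pattern follows K) can be illustrated with a small Python simulation. This is a sketch under assumptions: the function `simulate` and the dictionary keys are hypothetical, N is taken to be divisible by the number of processors, and the "pattern" is reduced to the single boundary section a processor exchanges with its right-hand neighbour.

```python
def simulate(n=100, num_procs=4, steps=10):
    """Count how often the communication pattern is computed versus reused
    over the TIME loop, from the point of view of processor 1."""
    block = n // num_procs  # assumed: n divisible by num_procs
    work_area = {"comm_params": None, "comm_pattern": None,
                 "mem_params": None, "mem_section": None}
    stats = {"comm_computed": 0, "comm_reused": 0, "mem_updated": 0}
    k = 1
    for _time in range(1, steps + 1):
        # Parameters that determine the communication pattern; K is not one.
        comm_params = (2, n - 1, num_procs, n, "BLOCK", 1, 1)
        if work_area["comm_params"] == comm_params:
            stats["comm_reused"] += 1          # same parameters: reuse
        else:
            work_area["comm_params"] = comm_params
            # Boundary section as a triplet, e.g. (26, 26, 1) for n=100.
            work_area["comm_pattern"] = (block + 1, block + 1, 1)
            stats["comm_computed"] += 1
        # Second-level check: the memory access pattern depends on K.
        mem_params = (k,)
        if work_area["mem_params"] != mem_params:
            work_area["mem_params"] = mem_params
            work_area["mem_section"] = (work_area["comm_pattern"], k)
            stats["mem_updated"] += 1
        k = 3 - k                              # swap the two columns
    return stats, work_area


stats, area = simulate()
print(stats)   # {'comm_computed': 1, 'comm_reused': 9, 'mem_updated': 10}
```

In this run the communication pattern is computed once at TIME=1 and reused nine times, while the memory section is refreshed at every step because K alternates between 1 and 2.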

[Brief Description of the Drawings] 
[Fig. 1] [Fig. 2] [Fig. 3] (flowcharts; drawings omitted) 



